-
Notifications
You must be signed in to change notification settings - Fork 4.8k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Improve software exception handling performance #108480
Improve software exception handling performance #108480
Conversation
This should not be needed. We are working on deleting FCThrow. |
src/coreclr/vm/excep.cpp
Outdated
#ifndef DACCESS_COMPILE | ||
// | ||
// Init a new frame | ||
// funcCallDepth: the number of frames to skip when looking for the return address. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Typo: funCallDepth vs. funcCallDepth
Do we need this argument at all? It is always 1
.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This will be passed in > 1 for the follow up change that will modify FCThrow to use this new way
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
You do not need to worry about FCThrows. They will be gone soon.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ah, I have missed your comment on the plan to remove the FCThrow helpers. Then I can simplify it.
I've also measured the performance improvement on Linux under WSL2 on the same machine where I've ran the Windows test. The single threaded results are exactly the same as the ones on Windows, but for multi-threaded case, this change improves the performance 3 times! The same is true for multi-threaded async EH. |
src/coreclr/vm/fcall.h
Outdated
#define FC_CAN_TRIGGER_GC_HAVE_THREAD(thread) | ||
#define FC_CAN_TRIGGER_GC_HAVE_THREADEND(thread) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
#define FC_CAN_TRIGGER_GC_HAVE_THREAD(thread) | |
#define FC_CAN_TRIGGER_GC_HAVE_THREADEND(thread) |
These are debug-only macros. We do not need special optimized variants that take Thread*.
(Also, passing around current Thread*
mattered a lot before current thread getter was converted to a regular thread static. It matters a lot less now.)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Also, passing around current
Thread*
mattered a lot before current thread getter was converted to a regular thread static. It matters a lot less now
I have measured a noticeable difference in perf when the GetThread was called multiple times in the IL_Throw.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
These are debug-only macros. We do not need special optimized variants that take Thread*.
Ok, makes sense
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Hmm, they are actually not debug only, they are defined as empty for !ENABLE_CONTRACTS Right, the contracts are debug only.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I have measured a noticeable difference in perf when the GetThread was called multiple times in the IL_Throw.
GetThread
should be just a couple of instructions and the C++ compiler should be able to CSE redundant calls. I would not expect that we are at the point where a couple more instructions in IL_Throw makes a noticeable difference.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Hmm, they are actually not debug only, they are defined as empty for !ENABLE_CONTRACTS
ENABLE_CONTRACTS == DEBUG
src/coreclr/vm/fcall.h
Outdated
@@ -799,6 +799,22 @@ LPVOID __FCThrowArgument(LPVOID me, enum RuntimeExceptionKind reKind, LPCWSTR ar | |||
#define HELPER_METHOD_FRAME_GET_RETURN_ADDRESS() \ | |||
( static_cast<UINT_PTR>( (__helperframe.InsureInit(NULL)), (__helperframe.MachineState()->GetRetAddr()) ) ) | |||
|
|||
#define EXCEPTION_METHOD_FRAME_BEGIN(funCallDepth) \ |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I do not think this needs to be wrapped in a macro. Inlining the code to two places where it is needed would be more readable.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This will be used in more places with the follow up changes - in the FCThrow variants.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Again, I've missed your comment on the fact that FCThrow
will go away. I am not sure what the plan is though. For example, the JIT_Div
throws uses the FCThrow
when there is division by zero or overflow. And there are other throwing helpers like JIT_ThrowMethodAccessException
, JIT_ThrowFieldAccessException
. And also some throw helpers that are QCALLs now ExceptionNative_ThrowEntryPointNotFoundException that I think could also get this improvement.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
For example, the JIT_Div throws uses the FCThrow when there is division by zero or overflow.
We will convert them into the mix of C#, FCalls without HMF, and QCalls as needed. For example, check #102129 that converted multiplication helpers that followed very similar pattern.
And also some throw helpers that are QCALLs now ExceptionNative_ThrowEntryPointNotFoundException that I think could also get this improvement.
Any QCall can throw exception. If we were to improve something about QCalls, it should be a scheme that applies to all QCalls. My preference would be switch QCalls to an ordinary exception marshalling scheme (ie it is what code outside runtime needs to do to marshal exceptions between managed/unmanaged): store the exception on the unmanaged side in thread-local storage and rethrow it on the managed side.
If we care enough about improving perf of specific exception-throwing QCalls, we should just move more of them to C#. There is no point in throwing C++ exception, catching it, and rethrowing it as managed exception. We can throw the managed exception directly from managed code.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ok, makes sense, so I have expanded the macros to the two locations they were used in in my last commit.
I am wondering what it would take to get rid of the one remaining unwind. It would require asm helper, but we have that asm helper for native AOT already. Can we re-use it (maybe with some ifdefs for now) to get rid of the one remaining unwind? We are likely going to share some of the native AOT asm helpers with CoreCLR to get some of the HMFs. |
I think it should not be difficult. The asm helper would likely be quite different from the NativeAOT one though, so it doesn't seem like it would make sense sharing it. I'll try to make such change, but I'd prefer doing it as a separate change. |
I have recently discovered that large part of the time spent in the EH while handling software exception is in the RtlLookupFunctionEntry Windows API. That API is called when unwinding native (non-managed) frames on stack. The current implementation of the IL_Throw JIT helper that is used to throw managed exceptions needs three unwinds to get to the managed caller. Due to the two passes of EH, it means there are six calls made to that API per throw. This change reduces that to just a single unwind, which leads to about 12% improvement in single threaded scenarios and about 30% improvement in multi threaded one for a case when an exception is thrown over 10 managed frames and then caught. It also leads to similar improvements in async exception handling performance. An additional benefit of this change is removal of the helper method frame from IL_Throw and IL_Rethrow, which contributes to the current effort of getting rid of the helper method frames. In a follow up changes, I am planning to make similar change to the FCThrow helper and its variants, further reducing the usage of the helper method frames. I have made this change for the new EH only.
9559a8a
to
f941262
Compare
It seems like I've broken something while making the changes based on the PR feedback |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM!
/azp run runtime-coreclr outerloop gcstress0x3-gcstress0xc |
No pipelines are associated with this pull request. |
/azp run runtime-coreclr outerloop, runtime-coreclr gcstress0x3-gcstress0xc |
Azure Pipelines successfully started running 2 pipeline(s). |
* Improve software exception handling performance I have recently discovered that large part of the time spent in the EH while handling software exception is in the RtlLookupFunctionEntry Windows API. That API is called when unwinding native (non-managed) frames on stack. The current implementation of the IL_Throw JIT helper that is used to throw managed exceptions needs three unwinds to get to the managed caller. Due to the two passes of EH, it means there are six calls made to that API per throw. This change reduces that to just a single unwind, which leads to about 12% improvement in single threaded scenarios and about 30% improvement in multi threaded one for a case when an exception is thrown over 10 managed frames and then caught. It also leads to similar improvements in async exception handling performance. On Linux, the multi-threaded gain is much higher, I've measured 3 fold improvement. An additional benefit of this change is removal of the helper method frame from IL_Throw and IL_Rethrow, which contributes to the current effort of getting rid of the helper method frames. I have made this change for the new EH only.
@janvorli quite a few improvements in dotnet/performance, nice |
I have recently discovered that large part of the time spent in the EH while handling software exception is in the
RtlLookupFunctionEntry
Windows API. That API is called when unwinding native (non-managed) frames on stack. The current implementation of theIL_Throw
JIT helper that is used to throw managed exceptions needs three unwinds to get to the managed caller. Due to the two passes of EH, it means there are six calls made to that API per throw.This change reduces that to just a single unwind, which leads to about 12% improvement in single threaded scenarios and about 30% improvement in multi-threaded one for a case when an exception is thrown over 10 managed frames and then caught on Windows. It also leads to similar improvements in async exception handling performance. I have also run the dotnet/performance EH tests and they have shown the same improvements.
An additional benefit of this change is removal of the helper method frame from
IL_Throw
andIL_Rethrow
, which contributes to the current effort of getting rid of the helper method frames.In a follow up change, I am planning to make similar change to the
FCThrow
helper and its variants, further reducing the usage of the helper method frames.I have made this change for the new EH only.